align responses api payload to previous chat/completion#77
Conversation
…to align with previous payload of chat/completion; relevant tests changes Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor
…nker, deleted neural weights and unnecessary fields) Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com> Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com> Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor
cc7c1f6 to
beb3a1c
Compare
…search cases; related tests update Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com> Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor
ddf12c6 to
8420482
Compare
AI-assisted review: #77 — Responses API payload alignmentBlocking1. Tests vs implementation drift On
Align to one schema (suggest: 2. Export parity — PR #75 rules not exported This PR maps only
Without A or B, this fixes payload shape, not prompt parity. 3. Document HPO vs Responses contract
Add a brief comment in Suggestions
Merge order
|
|
Ad.1) Ad.2) How can I incorporate things from a not yet merged PR? Please propose the desired merging order Ad.3) I'm not really following. The only "contract" between HPO and Responses phases is that the Responses call should resemble as closely as possible the chat/completion requests made during HPO phase. It has been achieved based on the multiple dedicated jira efforts. The instruction also do match (to most possible extent). What exactly should I document then? As per the suggestions:
-
|
Signed-off-by: Jakub Walaszczyk <jwalaszc@redhat.com> Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor
Signed-off-by: Jakub Walaszczyk <jwalaszc@redhat.com> Assisted-by: Claude Code Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Signed-off-by: Jakub Walaszczyk <jwalaszc@redhat.com> Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Co-authored-by: Jakub Walaszczyk <jwalaszc@redhat.com> Assisted-by: Claude Code Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
…entation publish script. Signed-off-by: Jakub Walaszczyk <jwalaszc@redhat.com> Assisted-by: Claude Code Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
This reverts commit 5217636. Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor
…#83) * fix(assets): merge HPO user prompt rules into Responses export system input Map Chat Completions prompts to responses_template without losing HPO-tuned rules or duplicating OGX file_search runtime text. Add build_responses_system_input() to merge non-redundant static supplements from user_message_text (RAG rules, answer length, language policy) into input[role=system], while stripping {reference_documents}, {question}, and citation instructions that OGX injects via annotation_prompt_params. Wire build_pattern_json() to use the merged system text (detected_language is still applied on generation before export). Add unit tests for export parity across all default model families and legacy prompt patterns. RHOAIENG-71231 Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor * black formatting Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor * ruff fix Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor * fix(assets): add fallback for empty system input in Responses export Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor * apply the feedback from Mateusz fix(assets): address PR review for Responses export system input merge Reorganize OGX runtime phrase lists into citation, grounding, and file-search groups with derived combined views. Tighten grounding deduplication so persona-only system prompts no longer suppress valid user supplements (e.g. Granite RAG block). Simplify OGX runtime stripping to a single pass, collapse whitespace after phrase removal, and remove unused citation helpers. Add unit tests for hybrid alpha=1.0 fallthrough, partial OGX sentence removal, explicit vs persona grounding detection, export-parity system input, and required generation fields. Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor * refactor(assets): address staff review on Responses export prompt mapping efactor(assets): address staff review on Responses export prompt mapping Remove unreachable citation dedup branch, simplify citation-line detection to use _CITATION_PREFIXES/_CITATION_SUBSTRINGS directly, and derive _system_has_grounding_policy from _GROUNDING_PREFIXES for a single source of truth. Rename _USER_RAG_SCAFFOLD_PREFIXES to _USER_RAG_GROUNDING_PREFIXES and document _join_answer_scaffold_blocks scope. Add direct tests for _is_placeholder_only_export and grounding-prefix coverage; remove duplicate empty-input test. Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor --------- Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Claude Code Review:
|
| Gap | Severity |
|---|---|
temperature=None and max_completion_tokens=None in generation |
High |
tool_choice value not schema-validated against the actual API contract |
High |
_normalize_answer_scaffold with non-canonical argument order (e.g. "Answer (with citations, max 150 words):") |
Medium |
_system_has_grounding_policy false-positive: system prompt contains _GROUNDING_PREFIXES text as a non-leading embedded substring |
Medium |
_strip_ogx_runtime_instructions over-stripping: a phrase partially matching phrase.rstrip(".") embedded mid-sentence |
Medium |
ranker_alpha passed as integer 1 (not float 1.0) |
Low |
_filter_ogx_duplicative_sentences with a sentence ending in ? or ! |
Low |
build_pattern_json called with detected_language set — verifies that build_responses_system_input is called after detected_language is stored |
Low |
5. Actionable Fixes
Fix 1 — tool_choice schema (HIGH)
Rationale: Protects against a silent API contract violation.
# Current (line 460) — suspicious empty object in tools array:
"tool_choice": {"mode": "required", "tools": [{}], "type": "file_search"},
# Proposed — verify the actual API spec, then use explicit shape:
"tool_choice": {"type": "file_search"}, # or as per Responses API documentationFix 2 — _system_has_grounding_policy over-matching (HIGH)
Rationale: Prevents spurious suppression of user supplements when a system prompt mentions documents in a non-grounding context.
def _system_has_grounding_policy(system: str) -> bool:
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", system)]
return any(
any(sent.lower().startswith(p.lower()) for p in _GROUNDING_PREFIXES)
for sent in sentences
)Fix 3 — _strip_ogx_runtime_instructions double-replace risk (HIGH)
Rationale: Eliminates mid-sentence partial-match over-stripping.
for phrase in _SYSTEM_GROUNDING_PHRASES:
text = text.replace(phrase, "")
text = text.replace(phrase.rstrip("."), "")
text = _collapse_whitespace(text)Fix 4 — Guard None temperature/tokens (HIGH)
Rationale: Prevents sending null to the Responses API for optional fields.
# In build_pattern_json, after building the base template dict:
if generation.get("temperature") is not None:
pattern["settings"]["responses_template"]["temperature"] = generation["temperature"]
if generation.get("max_completion_tokens") is not None:
pattern["settings"]["responses_template"]["max_output_tokens"] = generation["max_completion_tokens"]Add corresponding test:
def test_omits_temperature_when_none():
pattern = _make_pattern()
pattern["settings"]["generation"]["temperature"] = None
build_pattern_json(pattern)
assert "temperature" not in pattern["settings"]["responses_template"]Fix 5 — _normalize_answer_scaffold regex (MEDIUM)
Rationale: Handles non-canonical comma order without producing orphaned punctuation.
def _normalize_answer_scaffold(line: str) -> str:
normalized = re.sub(r",?\s*with citations,?\s*", "", line)
return _collapse_whitespace(normalized)Fix 6 — ranker_alpha floating-point comparison (MEDIUM)
Rationale: Makes floating-point intent explicit and avoids future type-confusion bugs.
import math
elif (
search_mode == "hybrid"
and ranker_strategy == "weighted"
and ranker_alpha is not None
and not math.isclose(ranker_alpha, 1.0)
):Fix 7 — Test case label order (LOW)
File: test_pattern_builder.py
Renumber the inline comments to # Case 1, # Case 2, # Case 3, # Case 4 in source order inside test_build_responses_system_input_handles_empty_inputs.
Fix 8 — while "\n\n\n" loop (LOW)
Rationale: Consistent with the existing re.sub in _collapse_whitespace; handles any run of 3+ newlines in one pass.
# Replace (pattern_builder.py:194-195):
while "\n\n\n" in result:
result = result.replace("\n\n\n", "\n\n")
# With:
result = re.sub(r"\n{3,}", "\n\n", result)42ee158 to
94086d3
Compare
…previous-chat-completions-requests-parameters-to-responses-requests-ones Resolve conflicts: - ai4rag/__init__.py: Use version 0.8.2 from main - pyproject.toml: Use docling-core 2.83.1 from main - pattern_builder.py: Remove detected_language parameter (handled in PR #81), keep PR #77 responses_template structure - uv.lock: Regenerate from main Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor
- Replace language_autodetect=False with language='English' - Update test expectations to match unified RAG instruction format - Remove checks for deprecated model-specific fragments Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor
Remove extra blank line between variable assignment and dict creation. Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor
Extract 260 lines of prompt filtering logic from pattern_builder.py into new prompt_filters.py module for better separation of concerns. Changes: - Create ai4rag/components/assets_generator/prompt_filters.py - Move all OGX deduplication constants and functions - Add comprehensive module docstring - Make functions public (remove underscore prefixes) - Update pattern_builder.py (487 → 316 lines, -35%) - Import filtering functions from prompt_filters - Remove duplicated filtering logic - Focus on pattern building responsibilities - Update tests to import from new module Benefits: - Clear separation: pattern building vs text filtering - Better maintainability: OGX phrase lists in one place - Easier testing: filter logic independently testable - Reduced cognitive load when reviewing pattern code All 35 tests passing. Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor
|
@Mateusz-Switala updates were made - please re-review. |
@LukaszCmielowski It seems that not all suggestions from #77 (comment) have been applied or addressed. |
Critical fixes:
- Remove legacy template support code (only support {reference_documents} format)
- Fix _system_has_grounding_policy to use sentence-level startswith instead of substring
Prevents false positives like "All documents do not contain PII"
- Add None guards for temperature and max_completion_tokens
Prevents sending null values to Responses API
- Replace while loop with single-pass regex in prompt_filters.py
Test coverage:
- Add test_omits_temperature_when_none
- Add test_omits_max_output_tokens_when_none
- Add test_system_grounding_detection_no_false_positive_on_embedded_substring
- Update test_extract_static_user_pure_text_no_slots for modern templates only
- Rename test_build_responses_system_input_legacy_user_prefix to
test_build_responses_system_input_strips_ogx_prefix
All 38 tests passing. Black formatting passing.
Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
|
@Mateusz-Switala noticed that - the commit above should address it |
|
@LukaszCmielowski I see that problem with |
Change headings metadata from list to string format using " > " separator. This makes the metadata more readable and consistent with other formats. Before: metadata["headings"] = ["Chapter 1", "Section 1.1"] After: metadata["headings"] = "Chapter 1 > Section 1.1" Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor
Update tests to handle headings as string ("A > B") instead of list.
Extract first heading from the string using .split(" > ")[0].
Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
Applied 5 fixes from Mateusz's review comments:
1. Fix tool_choice schema (CRITICAL)
- Changed from: {"mode": "required", "tools": [{}], "type": "file_search"}
- Changed to: {"type": "file_search"}
- Matches OGX SDK authoritative type definition
2. Update test assertion for tool_choice
- Updated test to match corrected schema
3. Remove citation constant duplication (DRY)
- Import _RAG_CITATION_INSTRUCTION from model_props.py
- Remove duplicate HPO_CITATION_INSTRUCTION
- Single source of truth for citation instruction
4. Document prefix constants
- Add inline comments explaining usage of each constant group
- Documents two-pass filtering architecture
- Helps contributors understand which list to update
5. Fix normalize_answer_scaffold regex
- Use regex to handle all comma orderings
- Handles both ", with citations" and "with citations," correctly
- No orphaned commas in output
All 89 tests passing.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
…previous-chat-completions-requests-parameters-to-responses-requests-ones
|

Description
Per each generated rag pattern there is also a ready to use out-of-the-box payload for Responses API requests. In order for it to be reliable it has to be strictly aligned with what's being sent during each iteration of the AutoRAG optimization process to Chat/Completion endpoint.
Motivation
Changes
responses_templatekey in each genrated rag patternTesting
How did you test this?
Checklist